\input paper.tex

\TITL{COGNITIVE  ECONOMY}

\vfill

\AUTHO{Douglas B. Lenat, \ Stanford University}

\AUTHO{Frederick Hayes-Roth, \ Rand Corporation}

\AUTHO{Phillip Klahr, \ Rand Corporation}

\vfill

\vskip 10pt plus 2pt

\NNSECP{ABSTRACT}
       Intelligent systems can explore only tiny subsets of their potential
       external and conceptual worlds.   They must develop efficient  forms
       of  representation,  access,   and  operation   to  increase   their
       capacities.  Some of these  forms involve abstraction, caching,  and
       expectation-simplified processing.  These capabilities in turn  can
       combine to provide extremely powerful increases in performance.  For
       example, all three can combine to simplify simulation or one of its
       related functions, the detection of surprising events.  In developing
       economic principles for modern  AI systems, we are forced to favor
       some counterintuitive ideas, policies which are not generally
       followed because their contribution to overall cognitive utility is
       not apparent.  For example, the  nonredundant  storage of properties
       in hierarchical
       inheritance nets  increases many  processing costs  while  providing
       minimal storage cost  savings.

\vfill

\eject

\NSECP{INTRODUCTION}


Every scientific theory is  constructed in a  rich context of  surrounding
theories, methods,  and standards  which determine  which experiments  are
reasonable ones to perform and which point of view to take when  interpreting
the results  -- in short, a {\it paradigm}. We feel it useful
to articulate the "hard core" of our paradigm (the assumptions our  theory
rests upon) before  presenting the  problem (Section  3) and  principal
ideas for  solution (Sections  4-6). Among our assumptions are: 
a model for intelligence
(Section 2.1),  a  model for  how  intelligent computer  programs  may  be
organized (Section 2.2),  and a  model of the  characteristics of  present
computing engines (Section 2.3).

In highly condensed form, our argument proceeds as follows:
(i) We continually face searches in more or less immense spaces; intelligence is
the ability to bring {\it appropriate} knowledge to bear, to speed up such searching.
Increasing intelligence then comprises increasing knowledge, its
organization, and the conditions for its applicability. 
How are these new discoveries made? Empirical induction
(generalizing from experimental and other observations), analogy, and the criticism
of existing theory all lead to new conjectures, new possibilities.
(ii) Intelligent systems can make the applicability of their knowledge explicit
by representing that knowledge as condition/action rules  (If {\sl situation}
then {\sl appropriate reaction}). Such pattern-directed inference systems (PDIS)
can benefit from a schema representation (frame, unit, being, script, etc.), because
this adds structure which the rules can then take advantage of.
(iii) Computers currently present us with limited cycles, cheap storage,
and expensive knowledge acquisition.
(iv) The problem: We want to develop initial systems quickly, and then have
them speed up and economize their computing, to maximize their potential.
(v) Many AI researchers {\sl cum} language designers have focused on the first
half, resulting in excellent software such as Interlisp's DWIM, File, and 
Break Packages. We therefore call attention to the second half, to ways in
which programs can (semi-)automatically improve themselves.
Our principal suggestions for meeting this challenge are:
redundant representation of knowledge at
multiple levels of
abstraction, caching of computed results,
and expectation-simplified computing.


\NSECP{THE ASSUMPTIONS}

\SSEC{Our Model of Intelligence}


Many human cognitive activities, including most of those commonly referred
to as "requiring intelligence", can be cast as searches, as explorations through
a search space, meandering toward some goal.  We call a problem-solver more
intelligent if he can efficiently work toward a solution even in the face of an
immense search space and an ill-defined goal.  
Somehow, he is imposing more structure on the problem, and using that to
search more effectively. How might he do this?  

According to our model of human intelligence, he accomplishes his task by
bringing knowledge to bear, knowledge not supplied explicitly in the problem
statement.  This knowledge can be about problem-solving in
general (e.g., how to recognize and break cultural blocks)
or about the problem's domain specifically
(e.g., which groups of atoms can usually be treated as superatoms). 
It may have been learned long ago, or generated during the
early phase of work on the problem.

This implies that a problem solver can become more effective by
(i) increasing his knowledge, (ii) better organizing his knowledge, and
(iii) refining his criteria for deciding when various pieces of knowledge
are applicable.  In terms of production (If/Then) rules, these correspond to
adding new rules, modifying the data structure in which rules are held,
and modifying the conditions (IF parts) of existing rules.
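
To make this concrete, here is a minimal sketch in Lisp (ours, purely
illustrative) of how an explicit rule representation makes each of these
three improvements a mechanical operation:

    (defstruct rule condition action)     ; the IF and THEN parts, kept separate

    (defvar *rules* nil)                  ; the rule base; reorganizing it is (ii)

    (defun add-rule (condition action)    ; (i) increasing knowledge
      (push (make-rule :condition condition :action action) *rules*))

    (defun refine-condition (rule extra-test)  ; (iii) sharpening applicability
      (let ((old (rule-condition rule)))
        (setf (rule-condition rule)
              #'(lambda (situation)
                  (and (funcall old situation)
                       (funcall extra-test situation))))))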

How is new knowledge discovered? One route is that of {\it abstraction}: condensing
many experiences into a heuristic which, in hindsight, we see would have
greatly aided us in the past, which would have speeded up our reaching this
state in the search. Closely related is {\it generalization}, often merely
expanding the domain of relevance of a specific bit of knowledge we already
possessed. {\it Analogy} is one of the most useful techniques; one can draw
parallels not merely to other existing facts and heuristics, but also to the
{\sl paths} which led to their discovery (e.g., even if a result in organic
chemistry does not map over into the inorganic world, the experiment which
led you to that first fact may map over into an experiment which {\it will}
reveal a useful fact in inorganic chemistry; even though the analogue of a
mathematical theorem may be false in some new domain, the analogous proof technique
may be useful). Another path to discovery is {\it specialization} of more
general knowledge. Finally, we must mention the process of {\it hypothesis,
criticism, and improved hypothesis}, which is a common and powerful
method of spiralling in toward precision.  In mathematics [Lakatos] and many
scientific disciplines [Musgrave & Lakatos], counterexamples
to current theories arise frequently, and their incorporation often demands a
deepening of the theory, an increase in knowledge.

Experiments  such as the AM program [ref]  support our assertion that
these methods can adequately
guide even open-ended searches for new definitions and conjectures.
But how can an intelligent system be programmed? How can such systems
be organized?


\SSEC{Our Model of Intelligent Program Organization}

The above remarks about intelligent problem solving apply equally to hardware
and wetware alike. Computer programs which are to be intelligent must
ultimately grapple with the tasks of knowledge acquisition, representation,
and refinement. We cannot provide an absolute answer to how they should be
organized, but we can posit some design constraints which have proven useful
so far.

A very general heuristic in AI programming is the following: If your program
is going to modify its own {\bf X}, then make {\bf X} as separable, clean, and explicit as
possible. In our case, {\bf X} can be instantiated as "knowledge", or as "applicability
conditions for each piece of knowledge". Thus the heuristic advises us to
represent our knowledge in a separate, clean, explicit form, say as knowledge
modules having some fixed structure, and also advises us to keep the applicability
conditions for each knowledge module separate from the rest of the knowledge it
contains. 

This naturally leads us to a pattern-directed inference system, in
which knowledge is broken into separate modules, each with an explicit set of
relevancy tests. Such systems arising in Pittsburgh may overemphasize
cleanliness (so-called pure production systems), while those arising in
California may tend to be a bit too laid back (so-called ad hoc expert systems),
but such variations should not obscure their common source of power.
The dominant PDIS architecture has knowledge broken into a set of condition/action
productions, rules of the form IF {\sl triggering situation} THEN {\sl appropriate
reaction}.

Having a clean representation for rules means having a clean, precise, elegant
language in which to express them.  By structuring the conceptual knowledge
of the system, by partitioning each module's knowledge into several categories,
a rule can condense an entire cumbersome description into a single pointer.
The popular schematized representations of knowledge (scripts for episodes,
frames for static situations, beings for specialists, units for everything)
enable a rule like "If there are no known methods specific to finding new examples
of prime numbers, Then..."
to have its condition coded as a simple 
null-test on the To-get subslot of the Examples slot of 
the schema for Prime Numbers.
By a judicious choice of slot types and subslot types, the system builder can reduce most
triggering conditions to such quick checks on the state of various (subslots
of) slots of schemata.
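
Such a check can bottom out in very little code.  The sketch below is ours
(an illustration, not AM's or Eurisko's actual implementation); it assumes
schemata are stored as property lists of slots, each slot itself a property
list of subslots:

    (defvar *schemata* (make-hash-table))    ; concept name --> schema

    (defun subslot (concept slot sub)
      "Value of the SUB subslot of the SLOT slot of CONCEPT, or NIL."
      (getf (getf (gethash concept *schemata*) slot) sub))

    ;; The rule's triggering condition reduces to one cheap structural check:
    (defun no-specific-example-methods-p ()
      (null (subslot 'primes 'examples 'to-get)))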

Additional knowledge is required to efficiently locate all the
rules which {/it might} have their conditions satisfied in a given situation,
and also to decide which rules to actually fire (obey) among those whose
IF parts have actually triggered (been satisfied).

The location of potentially-relevant rules is facilitated by organizing the
rules into some structure.  For example, AM[ref] organized  mathematical concepts
into a generalization/specialization hierarchy, and tied each rule to its most
relevant concept. To find rules potentially relevant to concept C, AM then
needed only to look at the rules tacked onto C, C's generalizations, {\it their}
generalizations, and so on. By inheritability, any rule relevant to
judging Objects in general was (potentially) relevant to judging
an Abelian group; any technique for estimating the value of any function was
relevant to estimating the value of Ackermann's function.
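
The rippling itself is brief.  A sketch, under the assumptions (ours, not
AM's actual code) that each schema records its immediate generalizations
under a genl slot, its attached rules under a rules slot, and that the
hierarchy is acyclic:

    (defun generalizations (concept)
      (getf (gethash concept *schemata*) 'genl))

    (defun attached-rules (concept)
      (getf (gethash concept *schemata*) 'rules))

    (defun potentially-relevant-rules (concept)
      "Rules tacked onto CONCEPT, its generalizations, THEIR generalizations..."
      (remove-duplicates
        (append (attached-rules concept)
                (mapcan #'potentially-relevant-rules
                        (copy-list (generalizations concept))))))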

This requires an explicit focusing of attention, a selection of a "current
topic" C from which the above rippling proceeds. We favor the maintenance
of a separate data structure for this purpose, an agenda of topics to consider,
of subtasks to work on. Knowledge may be brought to bear to select a topic,
then the above quest for potentially relevant rules is carried out, then they
are examined to find the truly relevant satisfied set, and finally some subset
of those are fired off.
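
In outline, the control structure we have in mind looks like the following
sketch, where select-best-topic and choose-subset are hypothetical stand-ins
for the knowledge-based selection steps:

    (defun control-loop (agenda)
      (loop while agenda
            do (let* ((topic      (select-best-topic agenda))
                      (candidates (potentially-relevant-rules topic))
                      (triggered  (remove-if-not
                                    #'(lambda (r) (funcall (rule-condition r) topic))
                                    candidates)))
                 (dolist (r (choose-subset triggered))  ; fire some subset
                   (funcall (rule-action r) topic))
                 (setq agenda (remove topic agenda)))))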


\SSEC{Our Model of (Present) Computers}


Since we are considering the problem of building computer models of
intelligent behavior, many of our assumptions deal with the characteristics
of present-day computers. They are the symbol manipulators we use,
but were not designed for that general purpose.

[RICK:  FILL THIS SECTION OUT; 2 BEAUTIFUL PARAGRAPHS WILL DO IT. BASIC IDEA WAS:
	Computers currently present us with limited cycles, cheap storage,
	uniprocessing, and expensive knowledge acquisition.]


\NSECP{THE PROBLEM}

When we build an AI program, we often find ourselves caught in the following
tradeoff: On the one hand, we want to develop our initial systems quickly
and painlessly, since they are experimental and ephemeral. On the other hand,
we want our systems to run as efficiently as possible, to minimize the
temporal delays, the magnitude of the cpu time required.  Quick and easy
implementing is at odds with careful analyses leading to programs which are
optimized, which have maximized their potential.

For example, Stanford's Ray Carhart spent much of 1978 visiting Edinburgh,
rewriting Dendral (its primary module, the CONGEN constrained
generation program) efficiently for a minicomputer.  A similar attempt may be
made soon for the Mycin program.  Typical running times of cpu hours are not
extravagant in AI, even though a carefully optimized version might run in
5% of the time. This is because the researcher, forced to decide between
expressibility and runtime efficiency, chooses the former.  It's better to accomplish
something inefficiently than to fail in an attempt which tried to save a few
hours of computation.


Many AI researchers {\sl cum} language designers have focused on the first
half, 
the problem of rapidly constructing a working experimental vehicle.
They have produced some
excellent software such as Interlisp's DWIM, File, and 
Break Packages[ref].  [Bobrow & Raphael 73] is an excellent discussion of
the issues involved in such efforts; see also [more recent ref; maybe by
Winograd & Bobrow on KRL?].
[SHOULD WE GO INTO ANY MORE DETAIL HERE, SUMMARIZING THIS?]


This paper is attempting to draw attention to the second half, to ways in
which programs can (semi-)automatically improve themselves.
The ideas exist which enable us to build in more "ultimate efficiency"
(e.g., the ability for a program to automatically increase its efficiency,
albeit at a nontrivial cost during the knowledge acquisition phase)
without risking the failure of the entire project.

Some earlier automatic programming systems [Darlington & Burstall, Lenat's PUP6,
others?] were designed to improve programs' efficiency, and many of those
ideas have found their way into the techniques we now describe.  
Dave Barstow and Elaine Kant have developed a pair of programs which respectively
(i) explicate the choices for representation, control, etc., and (ii) make such
choices, using a model of the overall structure of the program, the user's intentions,
general analysis-of-algorithms techniques, and other knowledge.
Earlier, Jim Low built a system [ref] which chose
appropriate data structures based upon the flavor of accesses
that were frequently made to a body of information.
These automatic programming systems
had program construction, transformation, and optimization as their primary
task; we are proposing ways to provide other kinds of AI programs
(speech understanders, theory formers, expert simulators, paraphrasers,...)
similar self-improving abilities.


Our principal suggestions for meeting this problem are:
(i) redundant representation of knowledge at
multiple levels of abstraction,
(ii) caching of computed results (storing the results of frequently-requested
searches, so they needn't be repeated over and over again),
and (iii) using predictions to filter away expected, unsurprising data.



\NSECP{ABSTRACTION}

A train accident occurs, and over two hundred people are injured; the
only large nearby hospital is called, and asked if they can handle the load.
The administrator queries his shiny new simulator, and it begins to chug
away.  It looks over the state of each current patient, each physician,
each hospital resource.  It begins simulating the arrival of the wounded,
their movement and temporary storage in hallways, etc.  After quite a while,
it shows that the hospital will be saturated and unable to care for any more
incoming patients.  The maximum they can accept is 157 passengers.
If "quite a while" is several hours (which it very likely will be), this is
simply inappropriate. There is a pressing need for a rough answer quickly;
a {\it human} hospital administrator would answer right away, or after a
very brief survey of a few parameters, and our belief is that {\it so should
an intelligent program}.

A colonel monitoring our defense systems detects a fleet of enemy aircraft
attacking in an unexpected pattern. Can our bombers and missiles meet this
new attack mode?  The colonel will know precisely how much time is available
to answer that question, and it will be a matter of seconds or minutes, not
of hours or weeks.  A detailed simulation program will be of no use to him.
Reasoning will go on at a high level of abstraction, and an approximate answer
will be forthcoming (either from a program capable of such inference, or
-- in lieu of that -- in the colonel's head).


[what are some other possible examples to use here?  We should pick say two,
and use the same ones throughout the paper, whenever possible.]




In many real-life problem solving situations, the "goal" is not a single,
precise answer. Rather, there is a cost/benefit tradeoff between the
accuracy of the solution and the amount of resources consumed in producing
it.  As with any such tradeoff, there is typically a point of diminishing
returns.  Computer programs which are designed only to return exact answers
will then be ill-suited to the needs of the various situations in which they
will be called upon.  One area in which this is recognized in AI is game-
playing.  The concept of truncated search is fundamental in all chess programs.
Yet very little of the same philosophy has permeated to other kinds of
AI programs.

[this last para. needs reworking; my analogy to truncated search may confuse
more than illuminate.]





see above:	Desirability of being able to compute rough answers cheaply
		Simulation

	Conceptual hierarchies
		Hier. in general: possible organizing relations
			Supervises, commands, controls, calls (as in subroutines)
			Containment: part-of versus specialization-of versus
					application-of versus element-of

		Inheritability: the importance of "or any ancestor"
			The notion that the hier. is a compact way of storing
				a huge set of assertions, is fairly efficient
				vis a vis retrieving / testing for them, and
				is much much more efficient for updating them.
			This discussion should be general to all above
				organizing relns, not just genl/spec hiers.
		Make sure to distinguish our meaning for abstraction
			(redundant, generalized version) from the other
			common meaning (inducing a general case from examples)

	Heuristic Hierarchies
		Each heuristic H has a domain of relevance
		More precisely, H's relevance is a function of task posed,
			and is high for some (its domain of relevance),
			low for many more, and perhaps negative for others.
		Heuristics can be related to each other by genl/specialization
			[example: "If P(x) is often false, weaken P" -->
				Also an example involving sim.]
		Heuristics can be object-centered (tacked onto relevant
			concepts) and can therefore inherit THEIR relationship
			[an ex. of that..  This is "Inheritability" in AM
			sense.]



	Interpretation and planning at levels of abstraction
		Eg., rules of bomber simulation at difft levels
		(this example will ultimately be used to suggest
		 caching for simplification)
		[this ties in with the last point about heur. hier.]
		Part of this para. might be this example:

"In the short run, what would happen if we didn't take any defensive
actions in a full-scale enemy attack situation?"  A strategist can
answer this instantly, to the effect that the enemy missles and bombs
would then reach their targets, destruction would be nearly total, our
capacity for retaliation would be severely reduced, and so on. He can
say this without factoring in the individual changes in aileron
attitude throughout the flight of each bomber.
He has a high-level "script" which specifies defensive action against an
attack, with a small note of explanation (e.g., "else destruction"), and
that's as detailed as he needs to get to answer the question.
Now consider a question like "What if we had a fleet of 200 B1 bombers
in addition to our current forces?" The strategist may again be capable
of giving a quick estimate of the effect, but it's possible that a more
detailed analysis would reveal some cusp, some nonlinearity not apparent
to him through simple extrapolation, through the application of general,
high-level rules of thumb.  One possibility is that his old, general,
high-level scenario could be used as a plan, as a set of general expectations
which guide the detailed simulator, which predict which aspects of the
world will benefit from much more detailed simulation (e.g., encounters where
our bombers hitherto failed to reach their targets), and which areas will
probably be unaffected by the change in the world (e.g., attacks on our
coast by enemy submarines).
[note that we can easily extend this discussion a bit for caching: keep
around the old simulation, at various levels, and only re-run those parts
which might be affected by the change in initial situation, and even then
only rerun it at the highest level possible.  This latter decision is probably
always made locally: run until the results stop changing, then you know
you've expanded down one level too far, so you can back up and stop.]
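
That local decision can be sketched directly.  Here run-model, close-enough-p,
and *max-detail* are hypothetical names of ours; this is a schematic, not a
real simulator:

    (defun simulate-region (region level)
      "Deepen the simulation of REGION only while added detail changes the answer."
      (let ((answer (run-model region level)))
        (if (= level *max-detail*)
            answer
            (let ((finer (run-model region (1+ level))))
              (if (close-enough-p answer finer)
                  answer                    ; results stopped changing: back up, stop
                  (simulate-region region (1+ level)))))))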


	Related research
		Include mention of psych. studies showing that people may
			reason at the word level, rather than more primitive
			level, because the high-level word manipulation
			process is more efficient than the manipulation of
			large, detailed structures (such as huge sem. nets).

\NSECP{CACHING}


	Modifying memory to save computed results to speed subsequent accesses
		Simplest case: After computing F(x), remember it.
		Also important: After computing F(x), remember the PATH
			i.e., the way you computed it, the traps, the blind
			alleys, the overall type of successful approach, etc.
		Analogue in chess: first hinted at by Newell, Shaw, and Simon
			in their early chess paper, then elaborated by Berliner
			in CAPS: the idea that you should abstract out a
			description of a path through the search tree, so
			that the remainder of the search could partake of it.

	Generalization of hardware concept
		What more to say than the name, the hardware defn and uses,
		and the obvious analogy to our above proposal.
		Perhaps here is a good place for some rough calculations
			which show the extra cost and benefits in some sits.
			In which case, use the same few situations all through
			the paper as running examples.

	EURISKO types of caching, as first examples
		This means a diversion into representation of concepts as
		slots and subslots (facets and subfacets).
		Two types of slots: elementary and virtual.
		Blend them together: a cached virtual slot appears much
			like an elementary primitive slot.
		Note this is most applicable when the values are slowly-
			changing.  This is ideal for the defns. of slots.
			This enables the program to occasionally change a
			slot's definition and yet (except right after such a
			change) run as efficiently as if that capability
			were not there.
		Analogous to COMPILING.
			Both of functions and of definitions.
		Make sure it's clear that caching of defns. is not in any
			way a special feature:  any facet can be cached.
		Options involved:
			The basic idea is that a request comes in with
			some description of the cost/benefit curve
			(for each amount of resources expended and each
			degree of precision of the result, specify how
			acceptable that resource consumption / accuracy of
			solution pair is.)  One extreme of this is to provide
			an ironclad limit on resources (Bobrow's resource-
			limited computation), and say "Do the best you can
			within these time/space/... limits."  Another extreme
			(most common in present-day programs) is to specify
			an ironclad goal criterion ("An answer which is at
			least this good, no matter how long it takes to get.")
			We are making the reader aware of the space in between
			the two extremes.
			Now that we went up to lofty heights, come back down.
			In Eurisko, we linearized this space.  We picked a
			few telling parameters of it, and said that a description
			was merely a list of values for these parameters:
				time, space, user communication, degree of precision.
			Go through a couple examples in detail.
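
	One such example, in skeletal form.  The sketch below is our
	illustration (not Eurisko's actual code), and it linearizes the
	request description down to a single precision parameter; a virtual
	slot consults its cache first and recomputes only when the cached
	answer fails that parameter:

	    (defstruct cache value computed-at precision)

	    (defvar *caches* (make-hash-table))  ; concept --> plist: slot --> cache

	    (defun virtual-slot (concept slot compute-fn min-precision)
	      (let ((c (getf (gethash concept *caches*) slot)))
	        (if (and c (>= (cache-precision c) min-precision))
	            (cache-value c)            ; cheap: looks just like an elementary slot
	            (let ((v (funcall compute-fn)))   ; the expensive search/deduction
	              (setf (getf (gethash concept *caches*) slot)
	                    (make-cache :value v
	                                :computed-at (get-universal-time)
	                                :precision min-precision))
	              v))))

	Except right after a change to the slot's definition, the virtual
	slot is then exactly as cheap as an elementary one.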

	Contrast with psychological conjectures of cognitive economy
		(e.g., Collins&Quillian, KRL, ...).  More like $HR↑2$ Plasticity
		model of storing all retrieved paths as direct links
		[I can guess what this means, but you (Rick?) should do this
		section.]

	General principles
	   Note that this is in general a heuristic rule driven decision-
		making process, and what we have specified below are just
		some of the relevant heuristics. Many of the more general,
		commonsense ones are absent, and even the ones present are
		not specified all the way down to the operational (LISP) level.
		If a program had these rules, the categories below would be
		the various kinds of subslots allowable (differentiable).

	   Note that there must exist heuristics for determining the
		applicability of the following, as a function of machine
		architecture. Equivalently: we haven't fully worked out
		the left hand sides (domains of relevance) of the rules.




	    Updating Principles
	    -------------------

	       When
		If the curr. request's description specifies that a more
			precise answer is needed than the one cached.
		Only if the resource alloc. is high enough to get a better
			or more current answer than the one cached already.
		Both of these mean: If you've gotten a better value to cache.

		Of course, if there is no cached answer available, update=write

		Typically: the cached value was F(x), and x changed,
			or the definition of F changed.


	       Why

		Frame problem:  the value is old, and the world (may've) changed.
		A new value is computed (either intentionally or accidentally)
			and its current-ness makes it "better" perhaps.

	       How
		  Set demon traps that flag the cache as out of date.
		  The user requests updating if the cache seems stale.

		Usually: discard the whole old value, store just the new one.
		If the program is very smart: When the world changes, try to
			propagate that change through the network of caches,
			changing THEM. This is much better than erasing them!
			Give an example of this in action (prob. fictitious).
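
	A fictitious example in sketch form, reusing the cache structure
	sketched above: each cache's value here depends additively on some
	fact, so a change can be pushed through the network rather than
	erasing the caches (the dependency table and the additivity
	assumption are ours):

	    (defvar *dependents* (make-hash-table))  ; fact --> caches computed from it

	    (defun world-changed (fact delta)
	      "Propagate a numeric change of DELTA in FACT through dependent caches."
	      (dolist (c (gethash fact *dependents*))
	        (incf (cache-value c) delta)           ; update them -- don't erase them
	        (setf (cache-computed-at c) (get-universal-time))))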



	       Where

	       In what form

	       What

	       When not to
		This should be merged with "When to"
	       How to
		How is this different from "How", above?






	    Storage Principles
	    ------------------

	       When
		  Every time you have to call a lower order function to eval. it
		    & it took quite a while.
		  You've called it before, recently, & the value didn't change.
		If the amount of processing to get it was very high.
			Esp: after a lucky accident, where a repeat might
			be very costly or even fail entirely to duplicate
			your recent success at getting a solution.
		If (freq of usage)x(avg cost to recompute)/(freq of change)
			is high, then it pays off.
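
	That payoff test can be written down directly; the threshold and
	the units (per day, cpu seconds) are arbitrary placeholders of ours:

	    (defun worth-caching-p (uses-per-day avg-recompute-cost changes-per-day
	                            &optional (threshold 10))
	      "Cache when (freq of usage) x (avg cost to recompute) / (freq of change)
	       is high enough."
	      (> (/ (* uses-per-day avg-recompute-cost)
	            (max changes-per-day 1/100))   ; guard against a never-changing value
	         threshold))

	E.g., a value used 50 times a day, costing 2 cpu seconds to
	recompute, and changing once a day scores 100 and clearly pays off.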




	       Why
		  Cost of recomputing vs. cost of storage.
		  Context of subsequent calls is similar enough (e.g., the same
		    arguments will come up again).
		Related to "When"; namely: in those situations, it's definitel
			cost-effective to store a cache.
		Analogy to hardware utility.

	       How
		  Called functions might suggest how to cache their value in higher
		    calling caches (e.g., my value changes often so cache my defn.).
		  Cache should be transparent & discardable (should be able to throw
		    them all away if space needed).
		Make sure to store a description of the CALL which led to
			this value being computed.  I.e., make sure you
			store the amount of resources brought to bear, the
			time of day (to tell freshness later on), whether
			any other caches were used in this search, etc.
		If possible, predict (criteria, time-limit, etc.) under
			which this will no longer be current (eg, depends
			on defn of Y, depends on there being no exs. of Z...)

	       Where
		The value to be cached should have a precise storage place.
		One extreme is an assertional database, with NO structure.
		We prefer to add slot and subslot structuring, with some
		domain-specific tinkering of those two sets (esp the former).
		Thus an extreme example of prime numbers should be stored
		on a slot called extreme-examples, on a schema called primes.
		At access time, we look in the precise spot(s) where X would
		be, and if it isn't there, we compute and cache it right there.

	       In what form
		  value or expression?  At what level of abstraction (e.g., a
		    partially evaluated symbolic expression)?

		  Stack previous values to enable you to tell if they're changing.

	       What
		  You store a flag saying you've been here before.
		  When it was computed.
		  How much effort was expended on it, down to what levels of
		    algorithms, with what other caches incorporated.
		  Certainty of the result.

	       When not to
		  The value changes too frequently.
		The value is so critical that the cost-benefit criteria
			(eg the resource limits are enormous) always favor
			recomputation rather than chancing a blunder.
		  The function evaluates as fast as the caching mechanism itself
		  Space is too tight
		In general, these are all related to When (above).

	       How to eliminate caches
		  Space tight --> eliminate least recently used caches (least recently referenced)
		If freq. of use is negligible
		Just reverse the formula in "When" (above), to see if it
			is no longer cost-effective to keep the cache.
		Remember that a value will occupy varying amts. of space.
			AM, eg, eliminated large cached values first.



	This is also a good place to present the progression of caching:
		This shows how "heuristics" are caches of experience, too.
ACCESS ==> CALCULATE ==> DEDUCE ==> INDUCE BY OPEN-ENDED SEARCH
      \     /        \     /   \      /
       cache        thm, alg   heur rule

[eg, a theorem or algorithm is one way to simplify a deduction, to reduce
it to mere calculation.  Heuristics are capable of reducing discovery tasks
to mere symbolic manipulation, to mere deductive computation (as in AM); and
caches are capable of reducing calculations to mere lookups]


MAYBE make some sense of this and put it in:
Save redundant genl/spec links; save values of virtual slots; save multiple
instantiations of the same rule at different levels of abstraction;
record condensations of many experiences as new heuristic rules.
[tradeoffs: size of KB;  updating problems; precision in reasoning one's
way through delicate inference chains efficiently; synthetic reasoning;
rule interaction]

Mention the psych. studies that show that people do develop direct associations
between words that are co-retrieved often. Thus dog and animal are closer
than dog and mammal, even tho dog --> mammal --> animal in the hierarchy.
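
A sketch of that plasticity effect (our illustration, assuming a
single-parent hierarchy kept on symbol property lists): a successful chained
retrieval caches a direct link, so frequently co-retrieved pairs grow closer
than the hierarchy says they are:

    (defun isa-p (a b)
      (or (not (null (member b (get a 'isa-direct))))  ; cached direct links first
          (let ((parent (get a 'isa)))
            (when (and parent (or (eq parent b) (isa-p parent b)))
              (push b (get a 'isa-direct))             ; store the retrieved path
              t))))

After (isa-p 'dog 'animal) once succeeds via dog --> mammal --> animal, a
direct dog --> animal link exists, and later calls are single lookups.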




\NSECP{EXPECTATION-SIMPLIFIED PROCESSING}

	Central notion:  reserve your computing for opportunities to realize
		potential for expanding knowledge
		Ie: be willing and able to add new facts,
			but be eager to add surprising new facts.
		Equivalently: by being guided by expectations (scripts, frames)
			you will find yourself zipping through 90% of what
			comes in from the world, and spending most of your
			processing time on the remaining puzzles, anomalies.
	You may decide how much to expend re-confirming the expected
		You may decide how much time to spend deciding how much time..
	Reductions realizable through expectations:
		Perceptual set:  see what you expect with less effort
		Surprise:  heighten sensitivity to data inconsistent with
			expectations
			Perhaps mention analogy to frog's visual system.
		Predict and prepare
			This includes a whole continuum, growing more
			flexible, more descriptive (less scalar), dynamic:
		  > Store a value, by binding a variable to it, so you can
			later refer to it (retrieve it) by name.
		  > Cache the value of a function-call, so if you ever want
			to find F(x), first you look to see if that value is
			already computed and stored away, else you compute it.
			Note that this is much like hashing, like associative
			memory: the idea that you have a place where F(x)
			should be (the hash of the string PARENTS(FRED), eg)
			and you look there first. This is just a genl. of
			the previous ">" about binding a variable to a value.
		   > After finding F(x), store the information that would
			enable you to recompute it rapidly.  Hopefully, this
			will also help you compute F(y) (either for all y
			or at least for some y's similar to x), and possibly
			will even help you compute G(x), where G is similar
			to F (and, much more rarely, G(y)).
			The idea here is to cache an algorithm, or at least
			constraints on a search, for finding F(x).
			One very special case of this is normal compiling:
			the compiled code for F is just a more efficient way
			of finding values of F. As with our above remarks, one
			often finds that his code works interpreted but not
			compiled. This is considered "wrong" (i.e., a bug in the
			compiler), but is much more in line with our general
			idea of mere assistance, which is sometimes quite
			useful but may sometimes not help you at all.
		   > Synthesize and store away a partial model of the world.
			This may include a (sub)network of frames, which
			together capture enough about the situation that
			in the future its relevance will be recognized, and
			some aspects of this situation will help guide future
			processing on some problem (which is recognized as
			similar).  This is much like analogy, like real
			"learning" (storing away experiences).

	What mechanisms are implicated?
		Caching
		  This should encompass also normal Compiling of subroutines.
		PDMs (as triggers or demons)
		  The analogy: you hear someone yell "Fire!" in a theater.
		  Also: mention the basis for much of humor: the punchline.
			The intentional misleading initially (wrong LHS)
			An unexpected reaction (wrong RHS)
			This may even provide a start at a taxonomy of
				humor, one which could be operationalized.
	Relevance to learning
		Confirm or disconfirm predictions
		This requires setting up PDMs to fire on dis/confirmation
		Yes, the idea is that when an expectation is not met, we
			also (have some chance of) modifying the culprit
			rule(s) that made that false prediction (and/or
			adding new rule(s) which would make the right one).
			This is v. similar to Mycin's task, and the TEIRESIAS
			approach is worth mentioning.
		This ties learning in very closely to scientific theory
			formation (at least the theories of knowledge which
			are activist, revolutionary conventionalist,
			methodological falsificationist, sophisticated).
			[those are the nodes descending in a genl/spec hier
			of theories of knowledge]



\NSECP{COGNITIVE ECONOMY REVISITED}

[let's leave this section until last, till the others are written out]

	Sample problem:  using a world model (simulator) to answer questions
		(e.g., what'd happen if 100 bombers went in there?)
		Representation of this knowledge as PDMs at difft levels of abstn
		Ability to generalize and cache results at one level at the
			next higher level,
				e.g. either as numerical tables, stat. distns, or
				symbolic expressions
		Ability to answer some questions appropriately for the requestor
			at a high level of abstraction

	KB Design
		One good reason to use inheritance is to speed knowledge
			implementation, not computing performance
		Using the system should result in its speedup
		Storage should be cheap
	Machine architecture
		PDI should be cheap
		PDMs should be scheduled with variable resources and
			should be able to spend effort accordingly
		How could propagation of changes be made efficient?



\NNSECP{Acknowledgements}

We wish to thank...

\vfill\end